cAMP-induced switching in turning 
direction of nerve growth cones 

Hong-jun Song, Guo-li Ming & Mu-ming Poo 

Nature 388, 275-279 (1997) 

The order of panels in Fig. 3 of this Letter is incorrect as published. 
Figure 3a-e should be labelled as f-j, and Fig. 3f-j should be 
labelled a-e. □ 

corrections 

Synthesis and X-ray structure of 
dumb-bell-shaped C 120 



Nature 387, 583-586 (1997) 

In this Letter, we overlooked a citation of G. Oszlanyi et al., Phys. 
Rev. B 54, 11849 (1996), who reported the observation of covalently 
bound (C 60 ) 2 2- dianions from the X-ray powder diffraction patterns 
of the metastable phases of KC 60 and RbC 60 . □ 



errata 



The yeast genome directory 

Nature 387 (suppl.) (1997) 

In the list of authors given on page 5 of this supplement, the names 
of some authors were omitted or misspelled (asterisks). These were: 
R. Altmann; W. Arnold*; M. de Haan*; K. Hamberg; K. Hinni; 
L. Jones; W. Kramer; H. Küster*; K. C. T. Maurer*; D. Niblett; 
N. Paricio*; A. G. Parle-McDermott*; C. Rebischung; C. Richards; 
L. Rifkin*; I. Robben; C. Rodrigues-Pousada*; I. Schaaff- 
Gerstenschlager*; P. H. M. Smits*; Y. Su*; Q. J. M. van der Aart*; 
I. C. van Vliet-Reedijk*; A. Wach; M. Yamazaki*. □ 



Measurements of elastic 
anisotropy due to solidification 
texturing and the implications for 
the Earth's inner core 

Michael I. Bergman 

Nature 389, 60-63 (1997) 

Owing to a typographical error, this Letter appeared under the title 
"Measurements of electric anisotropy due to solidification texturing 
and the implications for the Earth's inner core". The word 'elastic' in 
the first line was erroneously replaced with 'electric'. □ 
The complete genome sequence of 
the gastric pathogen Helicobacter 
pylori 

Jean-F. Tomb, Owen White, Anthony R. Kerlavage, 

Rebecca A. Clayton, Granger G. Sutton, 

Robert D. Fleischmann, Karen A. Ketchum, 

Hans Peter Klenk, Steven Gill, Brian A. Dougherty, 

Karen Nelson, John Quackenbush, Lixin Zhou, 

Ewen F. Kirkness, Scott Peterson, Brendan Loftus, 

Delwood Richardson, Robert Dodson, 

Hanif G. Khalak, Anna Glodek, Keith McKenney, 

Lisa M. Fitzegerald, Norman Lee, Mark D. Adams, 

Erin K. Hickey, Douglas E. Berg, Jeanine D. Gocayne, 

Teresa R. Utterback, Jeremy D. Peterson, 

Jenny M. Kelley, Matthew D. Cotton, 

Janice M. Weidman, Claire Fujii, Cheryl Bowman, 

Larry Watthey, Erik Wallin, William S. Hayes, 

Mark Borodovsky, Peter D. Karp, Hamilton O. Smith, 

Claire M. Fraser & J. Craig Venter 

Nature 388, 539-547 (1997) 

In this Article, we incorrectly stated that the amino acids lysine and 
arginine are twice as abundant in H. pylori proteins as they are in 
those of Haemophilus influenzae and Escherichia coli. This statement 
was derived from amino-acid analyses that compared absolute 
differences in abundance, but these do not reflect the frequencies 
with which amino acids are found in the organisms in question. The 
actual abundance of arginine in H. pylori, H. influenzae and E. coli is 
3.5, 4.5 and 5.5%, respectively; the abundance of lysine in these 
organisms is 8.9, 6.3 and 4.4%, respectively. This oversight is 
particularly unfortunate because Russell F. Doolittle, who wrote 
an accompanying News and Views on our Article and brought this 
to our attention, was led to comment on the significance of our 
inaccurate observation. We regret this and any other misunder- 
standing that our error may have caused. □ 
Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding 
sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, 
and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins 
were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of 
sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and 
dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses 
recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive 
evolution. Consistent with its restricted niche, H. pylori has few regulatory networks, and a limited metabolic 
repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive 
inside-membrane potential in low pH. 

For most of this century the cause of peptic ulcer disease was 
thought to be stress-related and the disease to be prevalent in 
hyperacid producers. The discovery 1 that Helicobacter pylori was 
associated with gastric inflammation and peptic ulcer disease was 
initially met with scepticism. However, this discovery and sub- 
sequent studies on H. pylori have revolutionized our view of the 
gastric environment, the diseases associated with it, and the 

Helicobacter pylori is a micro-aerophilic, Gram-negative, slow- 
growing, spiral-shaped and flagellated organism. Its most charac- 
teristic enzyme is a potent multisubunit urease 3 that is crucial for its 
survival at acidic pH and for its successful colonization of the gastric 
environment, a site that few other microbes can colonize 2 . H. pylori 
is probably the most common chronic bacterial infection of 
humans, present in almost half of the world population 2 . The 
presence of the bacterium in the gastric mucosa is associated with 
chronic active gastritis and is implicated in more severe gastric 
diseases, including chronic atrophic gastritis (a precursor of gastric 
carcinomas), peptic ulceration and mucosa-associated lymphoid 
tissue lymphomas 2 . Disease outcome depends on many factors, 
including bacterial genotype, and host physiology, genotype and 
dietary habits 4,5 . H. pylori infection has also been associated with 
persistent diarrhoea and increased susceptibility to other infectious 
diseases 6 . 

Because of its importance as a human pathogen, our interest in its 
biology and evolution, and the value of complete genome sequence 
information for drug discovery and vaccine development, we have 
sequenced the genome of a representative H. pylori strain by the 
whole-genome random sequencing method as described for 
Haemophilus influenzae 7 , Mycoplasma genitalium 8 and 
Methanococcus jannaschii 9 . 

Table 1 Genome features 

Coding regions (91.0%) 
Stable RNA (0.7%) 
Non-coding repeats (2.3%) 
Intergenic sequence (6.0%) 

Ribosomal RNA 
23S-5S 
23S-5S 

Coordinates 
445,306-448,642 bp 
1,473,557-1,473,919 bp 
1,209,082-1,207,584 bp 
1,511,138-1,512,635 bp 
448,041-448,618 bp 
629,845-630,124 bp 

Region 1 (33% G + C) 452-479 kb 
Region 2 (35% G + C) 539-579 kb 
Region 3 (33% G + C) 1,049-1,071 kb 
Region 4 (43% G + C) 1,264-1,276 kb 
Region 5 (33% G + C) 1,590-1,602 kb 

Associated genes 
IS605, 5S RNA and re... 
cag PAI (Fig. 4) 
IS605, 5S RNA and re... 
β and β' RNA polym... 

Coding sequences 
1,590 coding ... 
1,091 identified ... 

General features of the genome 

Genome analysis. The genome of H. pylori strain 26695 consists of a 
circular chromosome with a size of 1,667,867 base pairs (bp) and 
average G + C content of 39% (Figs 1 and 2). Five regions within the 
genome have a significantly different G + C composition (Table 1 
and Fig. 1). Two of them contain one or more copies of the insertion 
sequence IS605 (see below) and are flanked by a 5S ribosomal RNA 
sequence at one end and a 521 bp repeat (repeat 7) near the other. 
These two regions are also notable because they contain genes 
involved in DNA processing and one contains 2 orthologues of 
the virB4/ptl gene, the product of which is required for the transfer 
of oncogenic T-DNA of Agrobacterium and the secretion of the 
pertussis toxin by Bordetella pertussis 10 . Another region is the cag 
pathogenicity island (PAI), which is flanked by 31-bp direct repeats, 
and appears to be the product of lateral transfer 11 . 
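
Regions of atypical G + C composition such as those described above can be located with a sliding-window scan. The following is a minimal sketch, not the procedure used for this genome: the window size, step and threshold are illustrative assumptions, the function names are hypothetical, and the sequence is treated as linear (windows do not wrap around the circular chromosome).

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_windows(seq: str, window: int, step: int):
    """Yield (start, G+C fraction) for each full window along a linear sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def atypical_windows(seq: str, window: int, step: int, threshold: float):
    """Starts of windows whose G+C differs from the whole-sequence mean by more than threshold."""
    mean = gc_fraction(seq)
    return [start for start, gc in gc_windows(seq, window, step)
            if abs(gc - mean) > threshold]


# Hypothetical toy sequence: an AT-rich background with a short GC-rich island.
toy = "AT" * 50 + "GC" * 10 + "AT" * 50
print(atypical_windows(toy, window=20, step=10, threshold=0.3))
```

On a real chromosome the window would be on the order of kilobases, and the threshold would be chosen relative to the genome-wide spread of window values rather than fixed in advance.
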
RNA and repeat elements. Thirty-six tRNA species were identified 
using tRNAscan-SE 12 . These are organized into 7 clusters plus 12 
single genes. Two separate sets of 23S-5S and 16S ribosomal RNA 
(rRNA) genes were identified, along with one orphan 5S gene and 
one structural RNA gene (Table 1). Associated with each of the two 
23S-5S gene clusters is a 6-kilobase (kb) repeat containing a 
possible operon of 5 ORFs that have no database matches. 

Eight repeat families (>97% identity) varying in length from 0.47 
to 3.8 kb were found in the chromosome (Figs 1 and 2). Members of 
repeat 7 are found in intergenic regions, while the others are 
associated with coding sequences and may represent gene duplica- 
tions. Repeats 1, 2, 3 and 6 are associated with genes that encode 
outer-membrane proteins (OMP) (Fig. 3). 

Two distinct insertion sequence (IS) elements are present. There 
are five full-length copies of the previously described IS605 11,13 and 
two of a newly discovered element designated IS606. In addition, 
there are eight partial copies of IS605 and two partial copies of 
IS606. Both elements encode two divergently transcribed transpo- 
sases (TnpA and TnpB). IS606 has less than 50% nucleotide identity 
with IS605 and the IS606 transposases have 29% amino-acid 
identity with their IS605 counterpart. Both copies of the IS606 
TnpB may be non-functional owing to frameshifts. 
Origin of replication. As a typical eubacterial origin of replication 
was not identified 14 , we arbitrarily designated basepair one at the 
start of a 7-mer repeat, (AGTGATT) 26 , that produces translational 
stops in all reading frames, as this repeated DNA is unlikely to 
contain any coding sequence. 
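
The claim that the (AGTGATT) 26 repeat produces translational stops in all reading frames can be checked directly. The sketch below is an illustration rather than part of the original analysis; it assumes the standard stop codons (TAA, TAG, TGA) and uses hypothetical function names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def frames_with_stop(seq: str) -> dict:
    """Map each of the six reading frames to True if it contains a stop codon."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through complete codons only, starting at this frame's offset.
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            result[f"{strand}{offset + 1}"] = any(c in STOP_CODONS for c in codons)
    return result


repeat = "AGTGATT" * 26
print(frames_with_stop(repeat))
```

Because the repeat unit is 7 bp long and 7 is not a multiple of 3, each reading frame cycles through every 3-mer of the repeating unit, including TGA and TAG on the forward strand and TAA on the reverse strand.
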

Open reading frames. One thousand five hundred and ninety 
predicted coding sequences were identified. They were searched 
against a non-redundant protein database resulting in 1,091 puta- 
tive identifications that were assigned biological roles using a 
classification system adapted from Riley 15 (Table 2). The 1,590 
predicted genes had an average size of 945 bp, similar to that 
observed in other prokaryotes 7-9 , and no genome-wide strand 
bias was observed (Fig. 2). More than 70% of the predicted proteins 
in H. pylori have a calculated isoelectric point (pI) greater than 7.0, 
compared to ~40% in H. influenzae and E. coli, perhaps reflecting an 
adaptation of H. pylori to gastric acidity. Of the basic amino acids, 
lysine is more abundant in H. pylori proteins (8.9%) than in those of 
H. influenzae (6.3%) and E. coli (4.4%), whereas arginine is less 
abundant (3.5% compared with 4.5% and 5.5%, respectively). 
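
Comparisons of lysine and arginine abundance between organisms rest on residue frequencies, that is, the count of each residue divided by the total number of residues across the predicted proteins. A minimal sketch of that calculation follows; the function name and the toy sequences are hypothetical and carry no biological meaning.

```python
from collections import Counter


def residue_percentages(proteins, residues=("K", "R")):
    """Percentage of each selected residue over all residues in the given proteins."""
    counts = Counter()
    total = 0
    for protein in proteins:
        protein = protein.upper()
        counts.update(protein)
        total += len(protein)
    if total == 0:
        raise ValueError("no residues supplied")
    return {aa: 100.0 * counts[aa] / total for aa in residues}


# Hypothetical toy "proteome" of two short peptides.
toy_proteome = ["MKKLR", "MAKTE"]
print(residue_percentages(toy_proteome))
```

Pooling residues over the whole proteome, as here, weights long proteins more heavily than averaging per-protein percentages would; which convention is appropriate depends on the question being asked.
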

Paralogous families. Ninety-five paralogous gene families com- 
prising 266 gene products (16% of the total) were identified 
(www.tigr.org/tdb/mdb/hpdb/hpdb.html). Of these, 67 (173 
proteins) have an assigned role. Sixty-four have only 2 members, 
while the porin/adhesin-like outer membrane protein family (Fig. 2) 
is the largest with 32 members. The largest number of paralogues 
with assigned roles fall into the functional categories of cell 
envelope, transport and binding proteins, and proteins involved 
in replication. The large number of cell envelope proteins might 
reflect either a reduced biosynthetic capacity or a need to adapt to 
the challenging gastric environment. 

Cell division and protein secretion 

The gene content of H. pylori suggests that the basic mechanisms of 
replication, cell division and secretion are similar to those of E. coli 
and H. influenzae. However, important differences are noted. For 
example, apparently missing from the H. pylori genome are orthologues 
of DnaC, MinC, and the secretory chaperonin, SecB. In oriC-type 
primosome formation, the DnaB and DnaC proteins form a B-C 
complex that delivers the DnaB helicase to the developing 
primosome complex 16 . The apparent absence of DnaC in H. pylori 
suggests that either a novel mechanism for recruiting DnaB exists or 
a DnaC orthologue with no detectable sequence similarity is 
present. Similar arguments can be made for other seemingly missing 
important functions. 

H. pylori has a classical set of bacterial chaperones (DnaK, DnaJ, 
CbpA, GrpE, GroEL, GroES, and HtpG). The transcriptional 
regulation of H. pylori chaperone genes is likely to be different 
from that in E. coli, as it seems not to have the sigma factors that 
upregulate chaperone synthesis in E. coli (heat-shock sigma 32 and 
stationary-phase sigma S). 

In addition to the SecA-dependent secretory pathway, H. pylori 
has two specialized export systems. One is associated with the cag 
pathogenicity island 11 and the other is the flagellar export pathway 
which is assembled from orthologues of FliH, FliI, FliP, FlhA, FlhB, 
FliQ, FliR and FliP 17 . Apparently absent from H. pylori are a type IV 
signal peptidase and orthologues of the dsbABC system, which in 
other species are required for the maturation of pili and pilin-like 
structures 18 and assembly of surface structures involved in virulence 
and DNA transformation 19 . 

Recombination, repair and restriction systems 

Systems for homologous recombination and post-replication, mis- 
match, excision and transcription-coupled repair appear to be 
present in H. pylori. Also present are genes with similarity to 
DNA glycosylases which have associated AP endonuclease activity. 
The RecBCD pathway, which mediates homologous recombination 
and double-strand break repair, and RecT and RecE orthologues, 
proteins involved in strand exchange during recombination 20 , seem 
to be absent. The ability of H. pylori to perform mismatch repair is 
suggested by the presence of methyl transferases, mutS and uvrD. 
However, orthologues of MutH and MutL were not identified. 
Components of an SOS system also appear to be absent. 

Bacteria commonly use restriction and modification systems to 
degrade foreign DNA. In H. pylori, this defence system is well 
developed with eleven restriction-modification systems identified 
on the basis of gene order and similarity to endonucleases, methyl- 
transferases, and specificity subunits. Three type I, one type II, and 
three type IIS systems were identified, as well as four type III 
systems, including the recently identified epithelial responsive 
endonuclease, iceA1, and its associated DNA adenine methyltransferase 
(M.HpyI) genes 21,22 . In addition to the complete systems, 
seven adenine-specific, and four cytosine-specific methyltransferases, 
and one of unknown specificity were found. Each of these 
has an adjacent gene with no database match, suggesting that they 
may function as part of restriction-modification systems. 

Figure 1 Linear representation of the H. pylori 26695 chromosome illustrating the 
location of each predicted protein-coding region, RNA gene, and repeat elements 
in the genome. Symbols are as follows: ++, Co 2+ , Zn 2+ , Cd 2+ ; ?, unknown; A/G/S, 
D-alanine/glycine/D-serine; B12, B12/ferric siderophores; E, glutamate; Mo, 
molybdenum; P, proline; P/G, proline/glycine betaine; Q, glutamine; S, 
serine; a-k, α-ketoglutarate; a/o, arginine/ornithine; aa, amino acids (specificity 
unknown); aa2, dipeptides; aaX, oligopeptides; fum, fumarate, succinate; glu, 
glucose/galactose; h, hemin; lac, L-lactate; mal, malate 2-oxoglutarate; nic, 
nicotinamide mononucleotides; pyr, pyrimidine nucleosides. Numbers associated 
with tRNA symbols represent the number of tRNAs at a locus. Numbers 
associated with GES represent the number of membrane-spanning domains 
according to the Goldman, Engelman and Steitz scale as calculated by TopPred 47 . 

Transcription and translation 

Although analysis of gene content suggests that H. pylori has a basic 
transcriptional and translational machinery similar to that of E. coli, 
interesting differences are observed. For example, no genes for a 
catalytic activity in tRNA maturation (rnd, rph, or rnpB) were 
identified and of the three known ribonucleases involved in mRNA 
degradation, only polyribonucleotide phosphorylase was found. 
Twenty-one genes coding for 18 of the 20 tRNA synthetases nor- 
mally required for protein biosynthesis were found. 

As in most other completely sequenced bacterial genomes, the 
gene for glutaminyl-tRNA synthetase, glnS, is missing, and the 
existence of a transamidation process is assumed. It is also possible 
that the product of the second glutamyl-tRNA synthetase gene, gltX, 
present in H. pylori, may have acquired the glutaminyl-tRNA 
synthetase function. H. pylori provides the first example of a 
bacterial genome apparently lacking an asparaginyl-tRNA synthe- 
tase gene, asnS. A transamidation process to form Asn-tRNAAsn 
from Asp-tRNAAsn has been reported for the archaeon Haloferax 
volcanii 22 and may also operate in H. pylori. Most intriguing, 
however, is the finding that in H. pylori the genes encoding the β 
and β' subunits of RNA polymerase are fused. In all studied 
prokaryotes the two genes are contiguous, but separate, and are 
part of the same transcriptional unit. Whether this gene fusion in H. 
pylori results in a fused protein, or whether the transcriptional or 
translational product of the fusion is subject to splicing, is currently 
not known. It is worth noting that an artificial fusion of the E. coli 
rpoB and rpoC genes is viable and results in a transcriptional 
complex, which has the same stoichiometry as the native complex 
(K. Severinov, personal communication). 

Adhesion and adaptive antigenic variation 

Most pathogens show tropism to specific tissues or cell types and 
often use several adherence mechanisms for successful attachment. 
H. pylori may use at least five different adhesins to attach to gastric 
epithelial cells 5 . One of them, HpaA (HP0797), was previously 
identified as a lipoprotein in the flagellar sheath and outer 
membrane 5,23 . In addition to the HpaA orthologue, we have identified 
19 other lipoproteins. Few have an identifiable function, but 
some are likely to contribute to the adherence capacity of the 
organism. 

Two adhesins 24-26 , one of which mediates attachment to the 
Lewis histo-blood group antigens, belong to the large family of 
outer membrane proteins (OMP) (Fig. 3) (T. Boren and R. Haas, 
personal communication). It is conceivable that other members of 
these closely related proteins also act as adhesins. Given the large 
number of sequence-related genes encoding putative surface- 
exposed proteins, the potential exists for recombinational events 
leading to mosaic organization. This could be the basis for antigenic 
variation in H. pylori and an effective mechanism for host defence 
evasion, as seen in M. genitalium 27 . 

At least one other mechanism for antigenic variation could 
operate in H. pylori. The DNA sequence at the beginning of eight 
genes, including five members of the OMP family, contains stretches 
of CT or AG dinucleotide repeats (Table 3a). In addition, poly(C) or 
poly(G) tracts occur within the coding sequence of nine other genes 
(Table 3b). Slipped-strand mispairing within such repeats is a 
documented feature of one mechanism of genotypic variation 28,29 . 
These mechanisms may have evolved in bacterial pathogens to 
increase the frequency of phenotypic variation in genes involved in 
critical interactions with their hosts 28 . Such 'contingency' genes 
encode surface structures like pilins, lipoproteins or enzymes that 
produce lipopolysaccharide molecules 28 . Our analysis suggests that 
the seventeen genes reported in Table 3a,b belong to this category 
and thus may provide an example of adaptive evolution in H. pylori. 

Figure 2 Circular representation of 

Figure 3 Multiple sequence alignment of members of the outer membrane 
protein family of H. pylori. These proteins were identified as OMPs based on the 
characteristic alternating hydrophobic residues at their carboxy termini. All 
members of this family have one domain of similarity at the amino-terminal end 
and seven domains of similarity at their carboxy-terminal end. Note that the first 11 
of these OMPs share extensive similarity over their entire length. Four of the 
OMPs were identified as porins (Hops) based on identity to published amino- 
terminal sequences, represented at the top of the alignment 50 . The most likely 
candidate for HopD is HP0913, which has 15 matches to the first 20-residue N- 
terminal peptide sequence 50 . These differences may be due to strain variability. 
The program Signal-P 48 was used to identify cleavage sites and signal peptides 
(underlined). Four of the OMPs have TTG start codons (HP1156, HP0252, HP1113, 
HP0796). Numbers embedded in the sequences represent amino acids omitted 
from the alignment. The star symbols indicate that HP722, HP725 and HP9 
proteins contain a frameshift in their signal-peptide-coding region. These frameshifts 
are associated with the presence of dinucleotide repeats (Table 3). 

Phenotypic variation at the transcriptional level may also operate 
in H. pylori. Examples of repetitive DNA mediating transcriptional 
control have been documented by the presence of oligonucleotide 
repeats in promoter regions 29 . Homopolymeric tracts of A or T in 
potential promoter regions of eighteen genes were found, including 
eight members of the OMP family (Table 3c). 
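
Homopolymeric tracts and dinucleotide repeats of the kind discussed above can be located with a simple pattern search. The sketch below is illustrative only: the minimum tract lengths are assumptions (the text does not state the thresholds used for Table 3), and the function name is hypothetical.

```python
import re


def find_simple_repeats(seq, min_homopolymer=8, min_dinucleotide_units=4):
    """Return (kind, unit, start, length) tuples for simple repeats, sorted by start."""
    seq = seq.upper()
    hits = []
    # Homopolymeric tract: one base repeated at least min_homopolymer times.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homopolymer.finditer(seq):
        hits.append(("homopolymer", m.group(1), m.start(), m.end() - m.start()))
    # Dinucleotide repeat: two different bases repeated in tandem
    # at least min_dinucleotide_units times.
    dinucleotide = re.compile(
        r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinucleotide_units - 1)
    )
    for m in dinucleotide.finditer(seq):
        hits.append(("dinucleotide", m.group(1), m.start(), m.end() - m.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical toy sequence: a start codon, five CT units, nine G residues, a stop codon.
toy = "ATG" + "CT" * 5 + "G" * 9 + "TAA"
for hit in find_simple_repeats(toy):
    print(hit)
```

A change in the number of repeat units that is not a multiple of three shifts the downstream reading frame, which is how slipped-strand mispairing within such tracts can switch expression of a gene on or off.
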

Virulence 

The virulence of individual H. pylori isolates has been measured by 
their ability to produce a cytotoxin-associated protein (CagA) and 
an active vacuolating cytotoxin (VacA) 5 . The cagA gene, though not 
a virulence determinant, is positioned at one end of a pathogenicity 
island containing genes that elicit the production of interleukin 
(IL)-8 by gastric epithelial cells 11,30 . Consistent with its more virulent 
character, H. pylori strain 26695 contains a single contiguous PAI 
region 11 (Fig. 4). 

VacA induces the formation of acidic vacuoles in host epithelial 
cells, and its presence is associated epidemiologically with tissue 
damage and disease 31 . VacA may not be the only ulcer-causing factor 
as 40% of H. pylori strains do not produce detectable amounts of the 
cytotoxin in vitro 5 . Sequence differences at the amino terminus and 
central sections are noted among VacA proteins derived from Tox + 
and Tox - strains 31 . This Tox + H. pylori strain contains the more 
toxigenic s1a/m1 type cytotoxin and three additional large proteins 
with moderate similarities to the carboxy-terminal end of the active 
cytotoxin (~26-31%) (Fig. 5). However, they lack the paired- 
cysteine residues and the cleavage site required for release of the 
VacA toxin from the bacterial membrane 31 (Fig. 5). We propose that 
these proteins may be retained on the outside surface of the cell 
membrane and contribute to the interaction between H. pylori and 
host cells. 

Figure 4 Comparison between the Cag pathogenicity islands of the sequenced 
strain, 26695 and the NCTC11638 strain. The twenty-nine ORFs of the contiguous 
PAI in strain 26695 are represented together with the corresponding ORFs from 
the PAI present in NCTC11638 (AC000108 and U60176). The PAI in NCTC11638 is 
divided by the IS605 elements into two regions, cagI and cagII. The PAI in 
NCTC11638 is flanked by a 31-bp (TTACaXtt1"GAGCCCATTCTTTAGCTTGTTTT) 
direct repeat (vertical arrows) as described 11 . Some of the genes encode proteins 
with similarity to proteins involved either in DNA transfer (Vir and Tra proteins) or in 
export of a toxin (Ptl protein) 10 . However, these genes do not have the conserved 
contiguous arrangement found in the VirB, Tra and Ptl operons, suggesting that 
this PAI is not derived from these systems. Most genes of the PAI have no 
database match, contrary to a previous suggestion 11 . Thirteen of the proteins have 
a signal peptide (squiggle line), three of them with a weaker probability (squiggled 
...). The average length of the signal peptides is 25 amino acids, suggesting 
that this PAI is of Gram-negative origin. Eight proteins are predicted to have at 
... (IM) 47 . Although the two PAI are ~ ... 
several notable and perhaps biologically releva... 
sequences. Four of the genes differ in size. In the PAI of strain 26695, HP 520 and 
521 are shorter, whereas HP523 is longer, and HP 527 actually spans both ORF13 
and 14. In addition, the N-terminal part of HP527 is 129 amino acids longer than the 
corresponding region in ORF14. HP548/549 contains a frameshift and is therefore 
probably inactive in strain 26695. The stippled box preceding ORF13 represents 
an N-terminal extension not annotated in the Genbank entry for the PAI of 
NCTC11638. The Y indicates ORFs that are neither GeneMark-positive nor 
GeneSmith-positive, so were not included in our gene list. However, these 
ORFs may be biologically significant. We do not represent cagR as an ORF, 
because it is completely contained within ORFQ, and is GeneMark-negative. 

The surface-exposed lipopolysaccharide (LPS) molecule plays an 
important role in H. pylori pathogenesis 32 . The LPS of H. pylori is 
several orders of magnitude less immunogenic than that of enteric 
bacteria 33 and the O antigen of many H. pylori isolates is known to 
mimic the human Lewis x and Lewis y blood group antigens 32 . Genes 
for synthesis of the lipid A molecule, the core region, and the O 
antigen were identified. Two genes with low similarity to fucosyl- 
transferases (HP379, HP651) were found and may play a role in the 
LPS-Lewis antigen molecular mimicry. Our analysis also suggests 
that three genes, two glycosyltransferases (HP208 and HP619) and 
one fucosyltransferase (HP379), may be subject to phase variation 
(Table 3a, b). 

As with other pathogens, H. pylori probably requires an iron- 
scavenging system for survival in the host 5 . Genome analysis 
suggests that H. pylori has several systems for iron uptake. One is 
analogous to the siderophore-mediated iron-uptake fec system of E. 
coli 34 , except that it lacks the two regulatory proteins (FecR and FecI) 
and is not organized in a single operon. Unlike other studied 
systems, H. pylori has three copies of each of fecA, exbB and exbD. 
A second system, consisting of a feoB-like gene without feoA, 
suggests that H. pylori can assimilate ferrous iron in a fashion 
similar to the anaerobic feo system of E. coli. Other systems for iron 
uptake present in H. pylori consist of the three frpB genes which 
encode proteins similar to either haem- or lactoferrin-binding 
proteins. Finally, H. pylori contains NapA, a bacterioferritin 34 , and 
Pfr, a non-haem cytoplasmic iron-containing ferritin used for 
storage of iron 35 . The global ferric uptake regulator (Fur) characterized 
in other bacteria is also present in H. pylori. Consensus 
sequences for Fur-binding boxes were found upstream of two fecA 
genes, the three frpB genes and fur. 

H. pylori motility is essential for colonization 36 . It enables the 
bacterium to spread into the viscous mucous layer covering the 
gastric epithelium. At least forty proteins in the H. pylori genome 
appear to be involved in the regulation, secretion and assembly of 
the flagellar architecture. As has been reported for the flaA and flaB 
genes, we identified sigma 28 and sigma 54-like promoter elements 
upstream of many flagellar genes, underscoring the complexity of 
the transcriptional regulation of the flagellar regulon 5 . 

Acidity, pH and acid tolerance 

H. pylori is unusual among pathogenic bacteria in its ability to 
colonize host cells in an environment of high acidity. As it enters the 
gastric environment by oral ingestion, the organism is transiently 
subjected to the extreme pH of the lumen side of the gastric mucous 
layer (pH ~2). The survival of H. pylori in acid conditions is 
probably due to its ability to establish a positive inside-membrane 
potential 37 and subsequently to modify its microenvironment 
through the action of urease and the release of factors that inhibit 
acid production by parietal cells 5 . A switch in membrane polarity 
provides an electrical barrier that prevents the entry of protons 
(H + ). A positive cell interior can be created by the active extrusion of 
anions or by a proton diffusion potential. The latter model appears 
more likely as no clear mechanism for electrogenic anion efflux is 
apparent in the genome. A proton diffusion potential would require 
the anion permeability of the cytoplasmic membrane to be low and, 
thus far, only three anion transporters have been identified. How- 
ever, it remains to be determined whether anion conductances are 
associated with other proteins: the MDR-like transporters (HP600, 
HP1082 and HP1206) or hypothetical. Although it has been 
suggested that proton-translocating P-type ATPases could mediate 
survival in acid conditions by the extrusion of protons from the 
cytoplasm 38 , this idea is not supported by the identified transporter 
genes. The P-type ATPase sequences in H. pylori (copAP, HP791, and 
HP1503) are more closely related to divalent cation transporters 
than to ATPases with specificity for protons or monovalent cations. 
One of them, HP0791, is involved in Ni 2+ supply, an essential 
component of urease activity 39 . The others may be involved in the 
elimination of toxic metals from the cytoplasm and not in pH 
regulation. 

Table 3 Homopolymeric tracts and dinucleotide repeats in H. pylori 

Additional mechanisms of pH homeostasis may well contribute 
to H. pylori survival. A change in protein content observed in 
response to a shift of extracellular pH from 7.5 to 3.0 suggests the 
presence of an acid-inducible response 40 . Although H. pylori lacks 
most orthologues of the genes that are acid-induced in E. coli and 
Salmonella typhimurium, including the amino-acid decarboxylases 
and formate hydrogen lyase, certain virulence factors, outer membrane 
proteins, sensor-regulator pairs and other proteins may be acid- 
induced. 
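
The size of the proton diffusion potential considered in this section can be estimated with the Nernst equation, E = (RT/F) ln([H+]out/[H+]in), which reduces to (RT ln 10 / F)(pH in - pH out). The sketch below is a worked illustration under stated assumptions and not a measurement: it assumes a membrane permeable only to protons, a temperature of 37 °C, and a cytoplasmic pH of 7.0 (a value chosen for illustration, not taken from the text).

```python
import math

GAS_CONSTANT = 8.314462618   # J mol^-1 K^-1
FARADAY = 96485.33212        # C mol^-1


def proton_equilibrium_potential(ph_inside, ph_outside, temperature_k=310.15):
    """Nernst potential for H+ in volts, inside relative to outside.

    Positive values mean the cell interior is positive, as expected when
    the outside is more acidic than the inside.
    """
    volts_per_ph_unit = GAS_CONSTANT * temperature_k * math.log(10) / FARADAY
    return volts_per_ph_unit * (ph_inside - ph_outside)


# Assumed cytoplasmic pH 7.0 against the external pH of about 2 given in the text.
potential_mv = 1000 * proton_equilibrium_potential(7.0, 2.0)
print(f"{potential_mv:.0f} mV (inside positive)")
```

This is an upper bound for the idealized case. Any appreciable anion or cation conductance would pull the membrane potential away from the proton equilibrium value, which is why low anion permeability matters for the model.
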

Regulation of gene expression 

Bacteria regulate the transcription of their genes in response to 
many environmental stimuli, such as nutrient availability, cell 
density, pH, contact with target tissue, DNA-damaging agents, 
temperature and osmolarity. In the case of pathogens, the regulated 
expression of certain key genes is essential for successful evasion of 
host responses and colonization, adaptation to different body sites, 
and survival as the pathogen passes to new hosts. In H. pylori, global 
regulatory proteins are less abundant than in E. coli. For example, 
orthologues of many DNA-binding proteins that regulate the 
expression of certain operons such as OxyR (oxidative stress), Crp 
(carbon utilization), RpoH (heat shock), and Fnr (fumarate and 
nitrate regulation) are absent. Only four H. pylori proteins have a 
perfect match to helix-turn-helix (HTH) motifs, a signature of 
transcription factors: a putative heat-shock protein (HspR), two 
proteins with no database match (HP1124 and HP1349) and SecA, a 
component of the general secretory machinery. In contrast, 34 
proteins containing an HTH motif were found in H. influenzae 
and 148 in E. coli. We identified several other putative regulatory 
functions, including SpoT and CstA for 'stringent response' to 
amino-acid starvation and to carbon starvation, respectively. 

Environmental response requires sensing changes and transmission 
of this information to cellular regulatory networks. Two-component 
regulator systems, consisting of a membrane histidine kinase sensor 
protein and a cytoplasmic DNA-binding response regulator, provide a 
well studied mechanism for such signal transduction. Four sensor 
proteins and seven response regulators were found in H. pylori, 
similar to the number found in H. influenzae. This is approximately
one third the number found in E. coli which, in contrast to H. pylori 
and H. influenzae, may be exposed to more environments. 

Metabolism 

Metabolic pathway analysis of the H. pylori genome suggests the
following features. H. pylori uses glucose as the only source of 
carbohydrate and the main source for substrate-level phosphoryla- 
tion. It also derives energy from the degradation of serine, alanine, 
aspartate and proline. The glycolysis-gluconeogenesis metabolic
axis constitutes the backbone of energy production and the start 
point of many biosynthetic pathways. The biosynthesis of peptido- 
glycan, phospholipids, aromatic amino acids, fatty acids and cofac- 
tors is derived from acetyl-CoA or from intermediates in the 
glycolytic pathway (Fig. 6). The metabolism of pyruvate reflects 
the microaerophilic character of this organism. Neither the aerobic 
pyruvate dehydrogenase (aceEF) nor the strictly anaerobic pyruvate 
formate lyase (pfl) associated with mixed-acid fermentation are 
present. The conversion of pyruvate to acetyl-CoA is performed by
the pyruvate ferredoxin oxidoreductase (POR), a four-subunit
enzyme thus far only described in hyperthermophilic organisms 41 .
The tricarboxylic acid cycle (TCA) is incomplete and the glyoxylate 
shunt is absent. The analysis of degradative pathways, uptake 
systems and biosynthetic pathways for pyrimidine, purine and 
haem suggests that H. pylori uses several substrates as nitrogen 
source, including urea, ammonia, alanine, serine and glutamine. 
The assimilation of ammonia, an abundant product of urease 
activity, is achieved by the glutamine synthase enzyme, and
α-ketoglutarate is transformed into glutamate by glutamate
dehydrogenase rather than by the glutamate synthase enzyme.

In H. pylori, proton translocation is mediated by the NDH-1 
dehydrogenase and the different cytochromes, including the 
primitive-type cytochrome cbb3 (Table 2). Four respiratory 
electron-generating dehydrogenases have been identified: glycerol-
3-phosphate dehydrogenase (GlpD), D-lactate dehydrogenase,
NADH-ubiquinone oxidoreductase complex (NDH-1), and a
hydrogenase complex (HydABC). Our analysis also suggests that
H. pylori is not able to use nitrate, nitrite, dimethylsulphoxide,
trimethylamine N-oxide or thiosulphate as electron acceptors. 
Much of our metabolic analysis is supported by experimental 
evidence 41,42 .

Evolutionary relationships of H. pylori

H. pylori is currently classified in the Proteobacteria, a large, diverse
division of Gram-negative bacteria which includes two other
completely sequenced species, H. influenzae and E. coli. Given this
taxonomic placement, based primarily on 16S rRNA sequence
comparisons, one might expect the proteins of H. pylori to resemble
their H. influenzae and E. coli homologues more closely
than those in other genomes such as Synechocystis sp., M.
genitalium, M. pneumoniae, M. jannaschii, and Saccharomyces
cerevisiae. This is indeed the case for many proteins. There are,
however, many examples of H. pylori proteins in amino-acid
biosynthesis, energy metabolism, translation and cellular processes
that have greater sequence similarity to those found in non-
Proteobacteria. For example, Dhs1, the initial enzyme in the
chorismate biosynthesis pathway, is 75.5% similar to the Arabidopsis
thaliana chloroplast Dhs1 gene product, and has minimal sequence
similarity to the equivalent E. coli AroH, AroF or AroG gene
products. The remaining enzymes in this pathway have strong
sequence similarity to their E. coli counterparts. Similarly, the H.
pylori prephenate dehydrogenase (TyrA), which converts choris- 
mate to tyrosine, and six out of 15 enzymes in the aspartate amino 
acid biosynthetic pathways, resemble those from B. subtilis. A 
similar pattern can be seen in a different functional category. 
Nearly all H. pylori tRNA synthetases have eubacterial homologues, 
mostly with best matches to Proteobacteria species. However, 
histidyl-tRNA synthetase shows several amino-acid sequence sig- 
natures in common with eukaryotic and archaeal (M. jannaschii) 
homologues. 

Such observations of discordant sequence similarity are often 
interpreted as evidence of lateral gene transfer in the evolutionary 
history of an organism. It is also possible that H. pylori diverged 
early from the lineage that led to the gamma Proteobacteria, and 
retained more ancient forms of enzymes that have been subse- 
quently replaced or have diverged extensively in H. influenzae and 
E. coli. 

Conclusion 

Our whole-genome analysis of H. pylori gives new insight into its 
pathogenesis, acid tolerance, antigenic variation and microaerophi- 
lic character. The availability of the complete genome sequence will 
allow further assessment of H. pylori genetic diversity. This is an 
important aspect of H. pylori epidemiology as allelic polymorphism 
within several loci has already been associated with disease 
outcome 5,21,31 . The extent of molecular mimicry between H. pylori 
and its human host, an underappreciated topic, can now be fully 
explored 43 . The identification of many new putative virulence 
determinants should allow critical tests of their roles and thus 
new insight into mechanisms of initial colonization, persistence 
of this bacterium during long-term carriage, and the mechanisms 
by which it promotes various gastroduodenal diseases. 



Methods 

H. pylori strain 26695 (ref. 44) was originally isolated from a patient in the 
United Kingdom with gastritis (K. Eaton, personal communication) and was 
chosen because it colonizes piglets and elicits immune and inflammatory 
responses. It is also toxigenic and transformable, and thus amenable to
mutational tests of gene function. 

The H. pylori genome sequence was obtained by a whole-genome random 

Mycoplasma genitalium", and Methanococcus jannaschii 9 . Ninety-two per cent 
of the genome was covered by at least one λ clone and only 0.56% of the
genome had single-fold coverage. 
Open reading frames (ORFs) and predicted coding regions were identified
using three methods. The predicted protein-coding regions were initially 
defined by searching for ORFs longer than 80 codons. Coding potential analysis 
of the entire genome was performed with a version of GeneMark 45 trained with 
a set of H. pylori ORFs longer than 600 nucleotides. Coding sequences and 
potential starts of translation were also determined using GeneSmith (H.O.S.,
unpublished), a program that evaluates ORF length, separation of ORFs and 
overlap and quality of ribosome binding site. ORFs with low GeneMark coding 
potential, no database match, and not retained by GeneSmith were eliminated. 
GeneSmith identified 25 ORFs that were smaller than 100 codons, had no
database match and were GeneMark negative. Frameshifts were detected by 
inspecting pairwise alignments, families of orthologues (similar proteins 
derived from different species) and paralogues (similar proteins from within 

dinucleotide repeats. Ambiguities were resolved by an alternative sequencing 
chemistry (terminator reactions), and by sequencing PCR products obtained 
using the genomic DNA as template. Frameshifts that remain in the genome are 
considered authentic and not sequencing artefacts. 
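
The first step described above, searching for ORFs longer than 80 codons, can be illustrated with a minimal sketch. This is not the authors' software: the function name `find_orfs`, the stop-to-stop definition of an open frame, and the treatment of sequence ends as frame boundaries are all assumptions made for illustration. GeneMark and GeneSmith scoring are not reproduced.

```python
# Illustrative sketch only; names and ORF definition are assumptions,
# not the pipeline used in the paper.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=80):
    """List open stretches longer than min_codons in all six frames.

    Each hit is (strand, frame, start, end, codons). start and end are
    0-based offsets on the scanned strand; end excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            # Offset just past the last complete codon in this frame.
            end = len(s) - (len(s) - frame) % 3
            for i in range(frame, end + 1, 3):
                at_stop = i < end and s[i:i + 3] in STOPS
                if at_stop or i == end:
                    codons = (i - run_start) // 3
                    if codons > min_codons:
                        orfs.append((strand, frame, run_start, i, codons))
                    run_start = i + 3
    return orfs
```

Candidates returned by a scan of this kind would still need the coding-potential and database-match filters described above before being accepted as genes.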

To determine their identity, ORFs were searched against a non-redundant 
amino-acid database as previously described 9 . ORFs were also analysed using 
175 hidden Markov models constructed for a number of conserved protein 
families (pfam v1.0) using hmmer 43 . In addition, all ORFs were searched
against the prosite motif database using MacPattern 46 . Families of paralogues 
were constructed by pairwise searches of proteins using FASTA. Matches that 
spanned at least 60% of the smaller of the protein pair were retained and 
visually inspected. 
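
The 60% coverage criterion for paralogue matches can be sketched as follows. The `Match` record, the function names and the single-linkage grouping step are hypothetical and introduced only for illustration; the paper states that retained matches were inspected visually, which this sketch does not replace, and no FASTA output parsing is shown.

```python
from collections import namedtuple

# Hypothetical record for one pairwise match; coordinates are 1-based, inclusive.
Match = namedtuple("Match", "a b a_start a_end b_start b_end")


def coverage_of_smaller(match, lengths):
    """Fraction of the shorter protein covered by the aligned region."""
    len_a, len_b = lengths[match.a], lengths[match.b]
    if len_a <= len_b:
        return (match.a_end - match.a_start + 1) / len_a
    return (match.b_end - match.b_start + 1) / len_b


def retain(matches, lengths, min_fraction=0.60):
    """Keep non-self matches spanning at least min_fraction of the smaller protein."""
    return [m for m in matches
            if m.a != m.b and coverage_of_smaller(m, lengths) >= min_fraction]


def families(matches):
    """Group proteins by single linkage (an assumed grouping rule)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for m in matches:
        parent[find(m.a)] = find(m.b)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)
```

Measuring coverage against the smaller protein keeps short paralogues from being discarded merely because their partner is much longer.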

A unix version of the program TopPred 47 was used to identify membrane-
spanning domains (MSD) in proteins. Six hundred and sixty three proteins
containing at least one MSD were found; of these, 300 had 2 potential MSDs or
more. The presence of signal peptides and the probable position of the cleavage
site in secreted proteins were detected using Signal-P, a neural net program that
had been trained on a curated set of secreted proteins from Gram-negative
bacteria 48 . 367 proteins were predicted to have a signal peptide. Lipoproteins
were identified by scanning for the presence of a lipobox in the first 30 amino
acids of every protein; 20 lipoproteins were identified, 18 of which were
Signal-P positive. Outer-membrane proteins were found by searching for 
aromatic amino acids at the end of the proteins. 
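
The two sequence scans in this paragraph can be sketched in a few lines. The lipobox pattern used here, `[LVI][ASTVI][GAS]C`, is a commonly quoted simplified consensus and is an assumption: the exact pattern used in the paper is not stated. Interpreting "at the end of the proteins" as the final residue being F, W or Y is likewise an assumption.

```python
import re

# Assumed simplified lipobox consensus; the paper does not give its pattern.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")
AROMATIC = set("FWY")


def has_lipobox(protein, window=30):
    """True if the assumed lipobox motif lies within the first `window` residues."""
    return LIPOBOX.search(protein[:window].upper()) is not None


def ends_with_aromatic(protein):
    """True if the C-terminal residue is aromatic (assumed reading of the text)."""
    return bool(protein) and protein[-1].upper() in AROMATIC
```

Hits from either scan are only candidates; the paper cross-checks lipoproteins against Signal-P predictions.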

Homopolymer and dinucleotide repeats were found by using RepScan 
(H.O.S., unpublished) which finds direct repeats of any length. All features 
identified using these programs were validated by visual inspection to remove 
false positives. Metabolic pathways were curated by hand and by reference to 
EcoCyc 49 .
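
A search for homopolymer and dinucleotide repeats of the kind mentioned above can be approximated with regular expressions. This sketch is not RepScan, whose algorithm is unpublished; the function name and both length thresholds are assumptions chosen for illustration.

```python
import re


def find_simple_repeats(seq, min_homopolymer=8, min_dinucleotide_units=4):
    """Find homopolymer tracts and dinucleotide repeats in a DNA string.

    Returns (kind, unit, start, units) tuples with 0-based start offsets.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    hits = []
    # One base followed by at least (min_homopolymer - 1) copies of itself.
    homo = re.compile(r"([ACGT])\1{%d,}" % (min_homopolymer - 1))
    for m in homo.finditer(seq):
        hits.append(("homopolymer", m.group(1), m.start(), m.end() - m.start()))
    # A two-base unit with distinct bases, repeated in tandem.
    dinuc = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_dinucleotide_units - 1))
    for m in dinuc.finditer(seq):
        hits.append(("dinucleotide", m.group(1), m.start(), (m.end() - m.start()) // 2))
    return hits
```

As in the paper, hits from an automated scan like this would be checked by eye to remove false positives.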




12. http://genome.wustl.edu/eddv/low/tR 



;r.org/tdb/mdb/hpdb/hpdb.html 